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Abstract 

The SPAI algorithm, a sparse approximate inverse preconditioning technique 
for large sparse linear systems, proposed by Grote and Huckle [SIAM J. Sci. Corn- 
put., 18 (1997), pp. 838-853.], is based on the F-norm minimization and computes 
a sparse approximate inverse M of a large sparse matrix A adaptively. How¬ 
ever, SPAI may be costly to seek the most profitable indices at each loop and M 
may be ineffective for preconditioning. In this paper, we propose a residual based 
sparse approximate inverse preconditioning procedure (RSAI), which, unlike SPAI, 
is based on only the dominant rather than all information on the current residual 
and augments sparsity patterns adaptively during the loops. RSAI is less costly to 
seek indices and is more effective to capture a good approximate sparsity pattern 
of A~^ than SPAI. To control the sparsity of M and reduce computational cost, 
we develop a practical RSAI(toZ) algorithm that drops small nonzero entries adap¬ 
tively during the process. Numerical experiments are reported to demonstrate that 
RSAI(tol) is at least competitive with SPAI and can be considerably more efficient 
and effective than SPAI. They also indicate that RSAI(to/) is comparable to the 
PSAI(toZ) algorithm proposed by one of the authors in 2009. 

Keywords. RSAI, SPAI, PSAI, sparse approximate inverse, F-norm minimiza¬ 
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1 Introduction 


Consider the iterative solution of large sparse linear system 


Ax = b, 


(l.I) 


where A is an n x n real nonsingular and nonsymmetric matrix, and 5 is a given 
n-dimensional vector. This kind of problem is a core problem in scientific and engi¬ 
neering computing. Krylov iterative solvers, such as the generalized minimal residual 
method (GMRES) and the biconjugate gradient stabilized method (BiCGStab) jl, 32], 
have been commonly used in nowdays for solving dni). However, when A has bad 
spectral property or is ill conditioned, the convergence of Krylov solvers are generally 
extremely slow 3^. In order to accelerate the convergence of Krylov solvers, one must 
utilize preconditioning techniques to improve the conditioning of so that Krylov 

solvers applied to resulting preconditioned systems converge fast. Sparse approximate 
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inverse (SAI) preconditioning has been one major class of general-purpose precondi¬ 
tioning [3, llTi . l32l |. and it aims at computing a sparse approximate inverse M ~ A~^ 
or factorized M = M1M2 ~ A~^ directly. With such an M available, the right and left 
preconditioned systems are 


AMy = b,x = My and MAx = Mb, 
respectively, and the factorized preconditioned system is 

MiAM 2 y = Mih, x = M 2 y. 


( 1 . 2 ) 


(1.3) 


We then use a Krylov solver to solve (m or depending on the way that M is 

applied. Since the coefficient matrices in the above preconditioned systems are roughly 
the identity matrix I, Krylov solvers are expected to converge quickly. 

The success of SAI preconditioning is based on the underlying hypothesis that the 
majority of entries of A~^ are small, which means that A, indeed, has good sparse 
approximate inverses. A good preconditioner M should be as sparse as possible, and it 
should be constructed efficiently and applied within Krylov solvers cheaply. There are 
two kinds of approaches to computing M. One of them gets a factorized M = Mi M2 
and applies Mi and M 2 to (m. Efficient algorithms of this kind are approximate 
inverse (AINV) type algorithms, which are derived from the incomplete biconjugation 
procedure ( 3 , 0 ]- Their stabilized and block variations are developed in 0 ]. An alter¬ 
native is the balanced incomplete factorization (BIF) algorithm, which computes an 
incomplete LU (ILU) factorization and its inverse simultaneously lOj, ll]|. The other 
kind of approach is based on F-norm minimization, which is inherently parallelizable 
and constructs M by minimizing ||AM — /||_f with certain sparsity constraints on M, 
where || ■ \\f denotes the Frobenius norm of a matrix. We will introduce more on this 
approach in the next paragraph. Kolotilina and Yeremin 0 have proposed a factor¬ 
ized sparse approximate inverse (FSAI) preconditioning procedure, which is a mixture 
of the above two kinds. FSAI has been generalized to block form, called BFSAI, in 

241. An adaptive alg orithm that generates the pattern of the BFSAI preconditioner 
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2^. For a comprehensive survey and comparison of SAI 


M can be found in ^_ __ 

preconditioning procedures, we refer the reader to 0,0. 

We focus on F-norm minimization based SAI preconditioning and revisit the SPAI 
algorithm 0 in this paper. A key of this kind of preconditioning is the determination 
of an effective approximate sparsity pattern of A~^. There are two approaches to doing 
this, one of which is static and the other is adaptive. A static SAI preconditioning 
procedure first prescribes a sparsity pattern of M and then computes M by solving n 
least squares (LS) problems independently UM- The main difficulty of this approach 
is how to choose an effective approximate sparsity pattern of A~^. A lot of research has 
been done on this issue, and some priori envelope patterns for effective approximate 
sparsity patterns of A~^ have been established; see, e.g., 0, 18, 21|. For a general 


irreducible sparse A, however, these envelope patterns are often quite dense, so that 
it is expensive to use them as patterns of M directly. To this end, several researchers 
have proposed and developed adaptive procedures, which start with a simple initial 
sparsity pattern and successively augment or adjust it until either the resulting M 
satisfies a prescribed accuracy, or the maximum loops are performed, or the maximum 
number of nonzero entries in M is reached. Such idea was first advocated in 1^. Grote 
and Buckle 19| have proposed the SPAI algorithm aimed at augmenting the sparsity 
pattern of M adaptively by adding the small number of most profitable indices at each 
loop . It has been generalized to block form, called BSPAI, in [l|. Gould and Scott 
[20| have given a number of enhancements which may improve the performance of the 
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SPAI algorithm. Chow and Saad 1^ have put forward a minimal residual based (MR) 
algorithm that uses the sparse-^arse iteration with dropping strategies. SPAI is more 
robust than the MR algorithm [^. Motivated by the Cayley-Hamilton theorem, Jia and 
Zhu have proposed an adaptive Power SAI (PSAI) preconditioning procedure and 
developed a practical PSAI(tol) algorithm with tol dropping tolerance, which has been 
shown to be at least competitive with SPAI and can outperform SPAI considerably for 
some difficult problems. Jia and Zhang have recently established a mathematical 
theory on dropping tolerances tol for PSAI(toZ) and all the static F-norm minimization 
based SAI preconditioning procedures. Based on the theory, they have designed robust 
adaptive dropping criteria. With the criteria applied, PSAI(toZ) and the mentioned 
static SAI preconditioning procedures can make M as sparse as possible and as equally 
effective as the possibly much denser one generated by the basic PSAI or the static SAI 
procedures without dropping small entries. 

Remarkably, the unique fundamental mathematical distinction of all the adaptive 
F-norm minimization based SAI preconditioning procedures consists in the way that the 
sparsity pattern of M is augmented or adjusted. Practically, both static and adaptive 
SAI procedures must control the sparsity of M. We mention that there is a prefiltration, 
which shrinks the pattern of dro ppi rig small entries of A before implementing a 
SAI preconditioning algorithm 12, 13, 13, 341. 

Jia and Zhang [28(] have made an analysis on SPAI and PSAI(toZ) and shown why 
PSAI(toZ) can be more effective than SPAI for preconditioning (II.ip . accompanied by 
detailed numerical comparisons of the two algorithms on a lot of regular and irregular 
sparse problems arising from applications. Here the meaning of ‘regular sparse’ is that 
all columns of A are sparse, and that of ‘irregular sparse’ is that A has at least one 
relatively dense column, whose number of nonzero entries is substantially more than 
the average number of nonzero entries per column of A. Empirically and numerically, 
a column is declared irregular sparse if it has at least lOp nonzero entries, where p is 
the average number of nonzero entries per column of A [^. For A irregular sparse, 
Jia and Zhang have shown that SPAI must be costly to seek and add indices at 
each loop and, moreover, the resulting M may be ineffective for preconditioning, while 
PSAI(toZ), though also very costly, can produce an effective preconditioner M. To this 
end, they have proposed an approach that first transforms the irregular sparse dLH) 
into certain new regular ones and then uses SPAI and PSAI(to/) to construct M for 
the regular problems. Such approach greatly improves the computational efficiency 
of SPAI and PSAI(to/) as well as the preconditioning effectiveness of SPAI applied to 
irregular sparse (HU) directly. 

In this paper, suppose that the current M is not good enough, and we need to 
augment or adjust the sparsity pattern of M in order to get a better M. We will 
propose a new adaptive F-norm minimization based SAI preconditioning procedure, 
called the residual based SAI (RSAI) algorithm, and develop a practical RSAI algo¬ 
rithm with dropping criteria exploited. The RSAI algorithm may greatly improve the 
computational efficiency and the preconditioning effectiveness of SPAI, especially when 
A is irregular sparse. The basic idea of RSAI algorithm is as follows: Differently from 
SPAI, for each column ruk, k = l,2,...,n, of the current M, by only selecting a 
few dominant indices that correspond to the largest entries in the residual of we 
augment the sparsity pattern of ruk using a new approach that avoids some possibly 
expensive computation and logical comparisons in SPAI when determining the most 
profitable indices based on the whole residual of m^. As it will turn out, the RSAI 
procedure is not only (considerably) less costly to seek and add indices but also more 
effective to capture a good approximate sparsity pattern of A~^ than SPAI, which is 
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especially true for an irregular sparse A. We derive a quantitative estimate for the 
number of nonzero entries in M, demonstrating how it depends on the sparsity pat¬ 
tern of A, the number of indices exploited that correspond to the largest entries of 
the residual at each loop, and the number /max of loops. To control the sparsity of 
M and improve computational efficiency, we develop a practical RSAI (toZ) algorithm 
by making use of the adaptive dropping criterion established in , which guarantees 
that two M obtained by RSAI(to/) and the basic RSAI without dropping small entries 
have comparable preconditioning effects. We show that the positions of large entries in 
M are automatically adjusted in a global sense during the loops. It is known 


271,12 


that SPAI retains the already occupied positions of nonzero entries in M in subsequent 
loops and adds new positions of nonzero entries in M at each loop. As a result, the 
RSAI (to/) algorithm captures an approximate sparsity pattern of A~^ in a globally 
optimal sense, while the SPAI algorithm achieves this goal only in a locally optimal 
sense. This difference may make RSAI(to/) advantageous over SPAI. Numerical experi¬ 
ments will confirm that RSAI(to/) is at least competitive with and can be substantially 
more efficient and effective than SPAI, and they will also illustrate that RSAI(to/) is 
as comparably effective as PSAI(to/). 

The paper is organized as follows. In Section [21 we review the F-norm minimization 
based SAI preconditioning and the SPAI procedure, and introduce the notation to be 
used. In Section [3l we propose the basic RSAI procedure. In Section jH we make a 
theoretical analysis and give some practical considerations, based on which we develop 
a practical RSAI(/o/) algorithm with dynamic dropping strategy in exploited. Fi¬ 
nally, we make numerical experiments on a number of real-world problems to confirm 
our assertions on RSAI(to/), SPAI and PSAI(to/) in Section jS] 


2 The F-norm minimization SAI preconditioning and the 
SPAI procedure 

A F-norm minimization based SAI preconditioning procedure solves the problem 

min \\AM — I\\f, (2.1) 

MgM 

where M. is the set of matrices with a given sparsity pattern J. Denote by A4k the 
set of n-dimensional vectors whose sparsity pattern is Jk = {i \ {i,k) G ^7}, and let 
M = (mi, m 2 , ■ ■ ■ ,Tnn)- Then (|2.1I) can be separated into n independent constrained 
LS problems 

min \\Amk - BkW, k = 1,2,... ,n, (2.2) 

m^&Mk 

where || • || denotes the 2-norm of a matrix or vector, and is the k-th. column of 
the identity matrix I of order n. For each k, let Ik be the set of indices of nonzero 
rows of A{-,Jk). Define Ak = A{Ik,Jk), the reduced size vector rhk = mk{Jk), and 
Cfc = efc(Xfc). Then (12.21) amounts to solving the smaller unconstrained LS problems 

min||Afcmfc -eA;||,fc = 1,2,... ,n, (2.3) 

rrifc 

which can be solved by QR decompositions in parallel. 

If M is not goo d enough, an adaptive SAI preconditioning procedure, such as SPAI 
0 and PSAI [29l|. improves it by augmenting or adjusting the sparsity pattern Jk 
dynamically and updating rhk for k = 1,2,... ,n efficiently. We describe SPAI below. 

Denote by the sparsity pattern of after / loops starting with an initial 
pattern and by I^'^ the set of indices of nonzero rows of A{-,J'^^^). Let Ak = 
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), Cfc = and the solution of (12.31) . Denote the residual of 

by 

rk = Amk-ek, (2.4) 

whose norm is exactly equal to the residual norm ||ffc|| of (12.31) defined by fj, = Akrhk — 
efc. If Tk / 0, define Ck to be the set of indices i for which rk{i) / 0 and A4 the set of 
indices of nonzero columns of A{Ck, •)• Then 


Jk=Mk\ jP 


(2.5) 


constitutes the new candidates for augmenting J'P in the next loop of SPAI. For each 
j ^ J^k-, SPAI solves the one-dimensional problem 


min \\rk + ^jAej\\, 
k-j 


and the 2-norm pj of the new residual + pjAej satisfies 


2 II ||2 

Pj = Ikfcll 


||Ae,-P 


( 2 . 6 ) 


(2.7) 


SPAI takes la, 1 ~ 5, indices from J7fc corresponding to the smallest pj, called the most 
profitable indices, and adds them to to obtain Let Ik be the set of indices 

of new nonzero rows corresponding to the most profitable indices added. SPAI gets 
the new row indices = Ik'^ UTfc, and by it and form an updated LS 

problem (12.3p . which is cheaply solved by updating the previous rhk- Proceed in such 
way until ||rfc|| = ||Amfc — efc|| < e or / > /max, where e is a prescribed tolerance, usually 
0.1 ~ 0.4. 

Each loop of SPAI consists of two main steps, which include the selection of the 
most profitable indices and the solution of the resulting new LS problem, respectively. 
For the selection of the most profitable indices, one first determines Ck through 
and Jk through Ck and J'k, computes the pj, then orders them, and finally selects 
the most profitable indices. Clearly, whenever the cardinality of j^k is big, this step is 
time consuming. It has been shown [ 2 ^ that the cardinality of jfk is always big for 
/ = 1, 2,... , /max when the k-th. column of A is relatively dense and the initial pattern 
is that of /, causing that SPAI is very costly to select the most profitable indices. 
In addition, we can easily see that SPAI is also costly to seek the most profitable 
indices for A row irregular sparse. This is the case that an index in Ck corresponds to a 
relatively dense row of A. For detailed numerical evidence on this, we refer to 26l. l28l|. 
where it has been clearly shown that SPAI is too costly and impractical since it spends 
too much time seeking the most profitable indices when A is irregular sparse. 


3 The RSAI preconditioning procedure 

Our motivation for proposing a new SAI preconditioning procedure is that SPAI 
may be costly to select the most profitable indices and may be ineffective for precon¬ 
ditioning, which is definitely the case when A has some relatively dense columns. Our 
new approach to augmenting and 1^ is based on the partially dominant other than 
all information on the current residual rk, and picks up new indices at each loop di¬ 
rectly and cheaply that avoids possibly expensive computation and logical comparisons 
needed by SPAI. Importantly, the new SAI preconditioning procedure is more effective 
to capture a good approximate sparsity pattern of A~^. Since our procedure critically 
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depends on the sizes of entries of r^, we call the resulting procedure the Residual based 
Sparse Approximate Inverse (RSAI) preconditioning procedure. In what follows we 
present a basic RSAI procedure. 

Suppose that M is the one generated by RSAI after I loops starting with the initial 
sparsity pattern Consider the residual ||rfc|| defined by (ESD for A: = 1, 2,..., n. 

If Ikfcll < e for a prescribed tolerance e, then satisfies the required accuracy, and 
we do not improve nik further. If ||rfc|| > e, we must augment or adjust to update 
TTT-fc so as to reduce ||rfc||. 

Denote by rk{i) the i-th entry of rk, and by Ck the set of indices i for which 
'f'ki'i') 7^ 0. Heuristically, the indices corresponding to the largest entries rk{i) of in 
magnitude are the most important and dominate Ck in the sense that these large entries 
contribute most to the size of ||rfc||. Therefore, in order to reduce ||rfc|| both substantially 
and cheaply, these most important or dominant indices should have priority, that is, 
we should take precedence to reduce the large entries rk{i) of rk by augmenting or 
adjusting the pattern of nik based on the dominant indices described above. Therefore, 
unlike SPAI, as a starting point, rather than using the whole Ck, RSAI will exploit only 
the dominant subset of it and augment or adjust the pattern of ruk in a completely new 
manner, as will be detailed below. 

At loop I, denote by the set of the dominant indices i corresponding to the 
largest \rk{i)\. Let j'k be the set of all new column indices of A that correspond to 
of row indices but do not appear in jjp ■ We then add Jk to to obtain a 
new sparsity pattern Denote by Tk the set of indices of new nonzero rows 

corresponding to the added column indices Jk- Then we update 1-k^^'^ ~ 
form the new LS problem 

(3-1) 

whose solution can be updated from in the way described in fl^ . 1^ . and 

the updated ruk is a better approximation to the A:-th column of A~^. We repeat 
this process until ||rfc|| < e oi I reaches /max- The above RSAI procedure effectively 
suppresses the effects of the large entries \rk{i)\ and reduces ||rfc||. 

Now we give more insight into When choosing it from Ck in the above way, 

we may encounter If so, we cannot augment the sparsity pattern of ruk- 

In this case, we set 

(3-2) 

i=0 

and choose iz^!^ from the set whose elements are in Ck but not in TZ^k . Obviously, the 
resulting iz^k is always non-empty unless ruk is exactly the /c-th column of A~^. On 
the other hand, if Jk happens to be empty, then jj:'' = Jk~^^'^ and we just skip to loop 
I + 2, and so forth. Since rk J 0, there must exist a / > / +1 such that Jk is not empty. 
Otherwise, Jk^ is the set of indices of all nonzero columns of A{Ck, •)) and 

rk{CkfA{Ck, J-f) = rk{CkfA{Ck, •) = 0, (3.3) 

which means that rk{Ck) = 0, i.e., = 0, and ruk is exactly the A:-th column of A~^ 

since A{Ck, •) has row full rank. 

The above RSAI procedure can be described as Algorithm 1, named as the basic 
RSAI algorithm. 
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Algorithm 1 The basic RSAI Algorithm 
For /c = 1, 2,..., n compute 


1 : 

2 : 

3: 

4: 


5: 


6 

7 

8 


Choose an initial pattern of ruk, and give a user-prescribed tolerance s and 
the maximum number /max of loops. Set I = 0 and = 0. 

Determine the set of indices of nonzero rows of solve (12.31] for fhk 

by the QR decomposition of Ak = A{X^\ recover nik from fhk, and compute 

rfc = Anik - Cfc. 


while ||rfc|| > e and I < /max do 

Set Ck to be the set of indices i for which rk{i) ^ 0, sort \rk{i)\ in decreasing 
order, and let be the set of dominant indices i that correspond to a few 


(0 


largest |rfc(/)| appearing in Ck but not in TZ^j^^. Augment = TZ^j^ 

Set Jk equal to the set of all new column indices of A{TZk, •) but not in ■ Let 
ik be the set of row indices corresponding to Jk that do not appear in and 
update = if’[jXk- 

Set / = / + 1; if = 0, then go to step 3. 

For each j G Jk, update ruk and ||rfc|| using the approach in [l9|,|29|, respectively. 

end while 


Two key differences between SPAI and RSAI are clear now. Firstly, for RSAI we 
order the nonzero entries in pick up a few dominant indices izf with the largest 
entries of in magnitude that do not appear in TZf. Using Tzf, we determine the 
new indices Jk and add them to jf to get the new column indices jj:~^^\ by which 
we identify the new row indices Xk to be added and form the set X^^^^ of row indices. 
Secondly, for RSAI we do not perform possibly expensive steps ()2.6I] and (|2.7I1 and 
the ordering and sorting followed. Since izf is a subset of Ck and we assume that it 
has only a few elements, its cardinality can be much smaller than that of Ck, which is 
typically true for A irregular sparse. As a result, the determination of and Jk~^^'^ 

is less costly, and it can be substantially less time consuming than SPAI, especially when 
A is irregular sparse. 

Similar to SPAI, we need to provide an initial sparsity pattern of M for the RSAI 
algorithm, which is usually chosen to be that of I when A has nonzero diagonals. We 
also need to provide a stopping criterion e, the number of the dominant indices and 
the maximum number /max of loops. For them, we take e = 0.1 ~ 0.4, similarly to that 
used in F-norm based SAI preconditioning procedures including SPAI and PSAI. We 
suggest taking the cardinality c of izf to be 3 or so at each loop. As for outer loops 
/ ma x, we take it to be small, say 10. 

In the next section, we make some theoretical analysis and develop a more practical 
RSAI algorithm with some dropping strategy used. 

4 Theoretical analysis and a practical RSAI algorithm 

We cannot guarantee that M obtained by the basic RSAI algorithm is nonsingular 
without additional requirements. Grote and Huckle [l^ present several results, showing 
how the non-singularity of M is related to e and how the eigenvalues and the singular 
values of the preconditioned matrix AM distribute. Their results are general and apply 
to M obtained by any F-norm minimization based SAI preconditioning procedure. 
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Huckle 21| shows that the patterns of for small // are effective upper 

bounds for the sparsity pattern of M by the SPAI algorithm. Note that both RSAI 
and SPAI augment the sparsity pattern of M based on the indices of nonzero entries 
of residuals k = 1,2,... ,n. Consequently, in the same way as [21[, we can justify 
that the patterns of for small /i are also upper bounds for the sparsity 

pattern of M obtained by the basic RSAI algorithm. 

Let us get insight into the pattern of M by the basic RSAI algorithm. Obviously, 
the dominant indices at each loop and the maximum number /max of loops directly 
affect the sparsity of M. Now we present a quantitative upper bound for the number 
of nonzero entries in M and show how the sparsity of M depends on that of A, the 
number c of the dominant indices at each loop and /max- 


Theorem 4.1. Assume that RSAI runs /max loops to generate M = {mi, m 2 , ■ ■ ■ ,mn). 
Denote by gk the number of nonzero entries in A{k,:), 1 < k < n, by g = maxi<fc<„ g^, 
by c the number of the dominant indices at each loop, and by nnz{mk) the number of 
nonzero entries in m^. Then for = {k} we have 


nnz{mk) < min{fl'c/max + 1, n}, A; = 1, 2,... , n 


(4.1) 


and 

nnz{M) < min{(gc/ ma x + l)n,n^}. (4-2) 

Proof. By assumption, it is known that the number of nonzero entries added to m^ at 
each loop does not exceed gc. Therefore, we have 

nnz{mk) < gc/max + 1, 

which, together with the trivial bound nnz{mk) < n, establishes (14.111 . Note that the 
number of nonzero entries of M is uz 2 ;(mfc). (14.2p is direct from (|4.1I) . □ 

Theorem O shows that if all the rows of A are sparse, i.e., g is small, then M 
must be sparse for /max and m small. On the other hand, if A has one relatively dense 
row, some columns of M may become dense quickly with increasing /. This is the case 
once an index in at some loops corresponds to a relatively dense row of A. In this 
case, the basic RSAI is inefficient since a relatively large LS problem will emerge in the 
next loop, causing that solving it is expensive. This is a shortcoming of the basic RSAI 
algorithm for A row irregular sparse. 

Under the underlying hypothesis that the majority of the entries of A~^ are small, 
we know that whenever M becomes relatively dense in the course of construction, the 
majority of its entries must be small and make little contribution to A~^. Therefore, 
in order to control the sparsity of M and improve the efficiency of the basic RSAI 
algorithm, we should drop those small entries promptly during the process for practical 
purposes. This asks us to introduce some reasonable dropping strategies into the basic 
RSAI algorithm so as to develop practical RSAI algorithms for constructing an effective 
and sparse M. 

We should be aware that SPAI implicitly uses a dropping strategy to ensure that 
all the columns of M constructed by it are sparse. Precisely, suppose that SPAI is run 
/ ma x loops starting with the pattern of I and a few, say, la most profitable indices are 
added to the pattern of m^ at each loop. Then the number of nonzero entries of the 
final TUfc does not exceed 1 + lalmux, which is hxed and small as la and /max are both 
hxed and small. Such M may not be robust since the number of large entries in the /c-th 
column of A~^ is unknown in practice. Moreover, for a general sparse A, the numbers 
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of large entries in the columns of A~^ may have great differences, that is, good sparse 
approximations of A~^ may well be irregular sparse. This is particularly true for A 
irregular sparse and even for A regular sparse [2^. Consequently, SPAI may generate 
a poor preconditioner M since some columns of it are too sparse for given small la and 

^max' 


The above analysis suggests that we should not fix the number of large entries of 
each column of M for RSAI in advance. In the spirit of PSAI(toZ) [^, a more robust 
and general-purpose dropping strategy is to retain all the large entries of ruk produced 
and drop those small ones below a prescribed tolerance tol during the loops. This kind 
of dropping strategy should better capture a good approximate sparsity pattern of A~^ 
and produce an effective M more possibly. 

Dropping tolerances tol used in SAI preconditioning procedures had been empir¬ 
ically chosen as some small quantities, say 10“^. Recently, Jia and Zhang 27|] have 
shown that such dropping criteria are not robust and may lead to a sparser but in¬ 
effective preconditioner M or a possibly quite dense M, which, though effective for 
preconditioning, is very costly to construct and apply. Jia and Zhang [27[ have estab¬ 
lished a mathematical theory on robust dropping tolerances for PSAI(toZ) and all the 
static F-norm minimization based SAI preconditioning procedures. Based on the the¬ 
ory, they have designed robust and adaptive selection criteria for dropping tolerances, 
which adapt to the RSAI algorithm directly: an entry rrijk is dropped whenever 


\mjk\ < talk 


nnz{mk)\\A\\ 


-J = 1,2,... ,n 


(4.3) 


during the loops, where rrijk is the j-th entry of and nnz{mk) is the number of 
nonzero entries in rrik at the current loop, and || • ||i is the 1-norm of a matrix. In 
terms of the theory in [27|, the RSAI (to/) equipped with the above dropping criterion 
will generate a SAI preconditioner M that has comparable preconditioning quality to 
the possibly much denser one generated by the basic RSAI algorithm without dropping 
small nonzero entries; see [j^ . Theorem 3.5]. 

Introducing (|4.3p into Algorithm [U we have developed a practical algorithm, called 
the RSAI(to/) algorithm. As a key comparison of RSAI(to/) and SPAI, we notice that, 
for RSAI(to/), the positions of large entries of rrik are adjusted dynamically during the 
loops, while the already occupied positions of nonzero entries in rrik by SPAI retain 
unchanged in subsequent loops and we simply add a few new indices to the pattern of 
rrik at each loop. Remarkably, some entries of rrik are well likely to change from large to 
small during the loops, so that the final M may have some small entries that contribute 
little to A~^. In other words, RSAI(to/) seeks the positions of large entries of A~^ in a 
globally optimal sense, while the SPAI algorithm achieves this goal in a locally optimal 
sense. Consequently, RSAI(to/) captures the sparsity pattern of A~^ more effectively 
than SPAI. 


5 Numerical experiments 

In this section we test a number of real-world problems coming from applications, 
which are described in Table [fi. We also list some useful information about the test 
matrices in Table [TJ For each matrix, we give the number s of irregular columns as 
defined in the introduction, the average number p of nonzero entries per column and the 

^All of these matrices are either from the Matrix Market of the National Institute of Standards and 
Technology at http://math.nist.gov/MatrixMarket/ or from the University of Florida Sparse Matrix 
Collection at http://www.cise.ufl.edu/research/sparse/matrices/ 
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number of nonzero entries in the densest column. To make the efficiency comparison 
as fair as possible, we have written the Fortran codes of SPAI, RSAI(to/) and PSAI(to/). 
After M is constructed by one of them, we then apply it within Krylov solvers. We 
shall demonstrate that RSAI(toZ) works well. In the meantime, we compare RSAI(to/) 
with SPAI and PSAI(toZ), illustrating that RSAI(to/) is at least competitive with SPAI 
and can be considerably more efficient and effective than the latter for some problems, 
and it is as comparably effective as PSAI(to/) for preconditioning Ax = b. 


Table 1: The description of test matrices, where s is the number of irregular columns, 
p the average number of nonzero entries per column, and pd the number of the nonzero 
entries in the densest column. An irregular column means that the number of its 
nonzero entries exceeds lOp. 


matrices 

n 

nnz 

s 

P 

Pd 

Description 

fs.541_3 

541 

4282 

1 

8 

538 

2D/3D problem 

orsirr_l 

1030 

6858 

0 

7 

13 

Oil reservoir simulation 

orsirr_2 

886 

5970 

0 

7 

14 

Oil reservoir simulation 

shermanl 

1000 

3750 

0 

4 

7 

Oil reservoir simulation 

sherman2 

1080 

23094 

0 

21 

34 

Oil reservoir simulation 

shermanS 

3312 

20793 

0 

6 

17 

Oil reservoir simulation 

saylr4 

3564 

22316 

0 

6 

7 

3D reservoir simulation 

cavityll 

2597 

71601 

0 

27 

62 

Subsequent computational fluid dynamics problem 

ex36 

3079 

53099 

0 

17 

37 

Computational fluid dynamics problem 

e20r0100 

4241 

131556 

0 

31 

62 

Computational fluid dynamics problem 

e30r0000 

9661 

306356 

0 

32 

62 

Computational fluid dynamics problem 

e40r0100 

17281 

553562 

0 

32 

62 

Computational fluid dynamics problem 

powersim 

15838 

64424 

2 

4 

41 

Power network problem 

raefsky3 

21200 

1488768 

0 

70 

80 

computational fluid dynamics problem 

scircuit 

170998 

958936 

104 

6 

353 

Circuit, many parasitics 


We perform numerical experiments on an Intel Core 2 Quad CPU E8400@ 3.00GHz 
with 2GB memory under the Linux operating system. The computations of construct¬ 
ing M are done using Fortran 90 with the machine precision Cmach = 2.2 x 10“^®. We 
take the initial sparsity pattern as that of I for SPAI and RSAI(toZ). We apply row 
Dulmage-Mendelsohn permutations to the matrices having zero diagonals so as to make 
their diagonals nonzero 1^. The related Matlab commands are j = dmperm(A) and 
A = A(j ,:). We applied demperm to cavityll, ex36, e20r0100, e30r0000 and edOrOlOO. 
We use the M generated by RSAI(to/), SPAI and PSAI(to/) as right preconditioners, 
and use BiCGStab as the Krylov solver, whose code is from Matlab 7.8.0. The initial 
guess on the solution of Ax = b is always xq = 0, and the right-hand side b is formed 
by choosing the solution x = (1,1,..., 1)^. The stopping criterion is 


where y is the approximate solution obtained by BiCGStab applied to the precondi¬ 
tioned linear system AMy = b. 

In all the tables, e, Zmax and c stand for the accuracy requirements, the maximum 
loops that RSAI(toZ) allows, and the numbers of the dominant indices exploited by 
RSAI(to/), respectively, spar = denotes the sparsity of M relative to A, iter 

stands for the number of iterations used by BiCGStab, ric is the number of columns of 
Af whose residual norms do not drop below e, and ptime and stime denote the CPU 
timings (in seconds) of constructing M and of solving the preconditioned linear systems 
by BiCGStab, respectively, f indicates that convergence is not attained within 1000 
iterations, :j; indicates that the Krylov solver stagnates and — means that we do not 
count CPU timings when BiCGStab fails to converge within 1000 iterations. 
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5.1 The effectiveness of the RSAI(toZ) algorithm 


First of all, we illustrate that RSAI(to/) can capture an effective approximate spar¬ 
sity pattern of A~^ by taking the 886 x 886 regular sparse matrix orsirr_2 as an example. 
We take the number c = 3 of the dominant indices exploited at each loop, the prescribed 
accuracy e = 0.2 and /max = 15. We have found that all columns of M satisfy the de¬ 
sired accuracy. We use the Matlab code inv{A) to compute A~^ directly and then 
retain the nnz{M) largest entries of A~^ in magnitude. Figure [T] depicts the patterns 
of M and A~^ that drops its small nonzero entries. We see that the pattern of M 
matches that of A~^ quite well, demonstrating that the RSAI(to/) algorithm can in¬ 
deed capture an effective approximate sparsity pattern of A~^. Importantly, it is clear 
from the figure that the sparsity patterns of M and A~^ are irregular and the numbers 
of large entries in the columns and rows of A~^ vary greatly, although the matrix or- 
sirr_2 itself is regular sparse. As we have counted, the number of irregular columns in 
A~^ is 63, each of which has more than \QVinAMl nonzero entries. It means that about 
8% columns in A~^ are irregular sparse. This confirms the fact addressed in 28|] that 
a good sparse approximate inverse of a regular sparse matrix can be irregular sparse. 
Such fact also implies that SPAI cannot guarantee to capture an effective approximate 
sparse pattern even when A is regular sparse and thus may be less effective for precon¬ 
ditioning an regular sparse problem (HH) since all the columns of M constructed by 
SPAI are sparse and each of them has at most I -|- /a/max nonzero entries, where la is 
the number of most profitable indices added at each loop. 



nnz = 26083 



0 200 400 600 800 

nnz = 26083 


Figure 1: orsirr_2: left: the pattern of M; right: the sparsity pattern of A 

Now we take some difficult problems from Table [T] which make BiCGStab without 
preconditioning fail to converge within 1000 iterations. We precondition these problems 
by RSAI(to/). The preconditioners are all computed by setting e = 0.4, c = 3 and 
/max = 10. Table [2] lists the results obtained. 

We observe that the RSAI(/o/) algorithm is effective to precondition these linear 
systems and accelerates the convergence of BiCGStab dramatically in all cases, as iter 
indicates. At the same time, we also find that the cost of constructing M is dominant. 
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Table 2: The RSAI(io/) algorithm for difficult problems 


matrices 

spar 

ptime 

Uc 

iter 

stime 

orsirr_l 

2.14 

0.18 

0 

29 

0.01 

orsirr_2 

2.14 

0.14 

0 

28 

0.01 

sherman2 

2.76 

1.59 

0 

6 

0.01 

shermanS 

1.15 

0.32 

0 

38 

0.02 

saylr4 

1.22 

0.92 

0 

672 

0.49 

ex36 

2.40 

4.82 

10 

90 

0.11 

e20r0100 

2.77 

20.42 

148 

148 

0.45 

raefskyS 

1.46 

104.70 

0 

206 

4.78 


This is similar to SPAI and PSAI as well as all the other non-factorized and factorized 
sparse approximate inverse preconditioning procedures [1, S]. For all the problems, we 
see that the ric are equal to zero except for ex36 and e20r0100, for which Uc is very 
small relative to n. This means that for given parameters the RSAI(to/) algorithm 
generally construct effective preconditioners, as also reflected by the iter, which are 
much smaller than 1000. Therefore, we conclude from Table [2] that the RSAI(to/) 
algorithm is generally effective for preconditioning (|l.ll) . 

Next we vary the stopping criterion e to see how it affects the sparsity of M and 
convergence of BiCGStab. We still set c = 3 and /max = 10. Table [3] reports the 
results obtained for shermanl, saylr4 and orsirr_2. As expected, BiCGStab used fewer 
iterations in all cases and M becomes denser as e decreases. However, we find that, 
for the three problems, though a smaller e makes BiCGStab converge faster, it is more 
costly to construct a denser M. Compared with the cost of constructing M, the cost 
of applying M and solving the preconditioned systems by BiCGStab is negligible for 
general problems, provided that M is an effective preconditioner. The choice of e = 0.4 
is the best as far as the total costs are concerned. The table also implies that /max = 10 
is conservative for the four given e since actual loops used does not achieve it and 
whether or RSAI(to/) terminated is up to e for the test problems. 


Table 3; The results for shermanl, saylr4 and orsirr_2 with different e 


shermanl: 

C — 3 S^nd. ^max 

= 10 

e 

spar 

ptime 

iter 

stime 

0.4 

2.04 

0.06 

29 

0.01 

0.3 

2.38 

0.07 

29 

0.01 

0.2 

3.96 

0.18 

18 

0.01 

0.1 

11.46 

1.83 

10 

0.01 


saylr4: c 

= 3 and 1 

max — 

10 

e 

spar 

ptime 

iter 

stime 

0.4 

1.22 

0.92 

672 

0.49 

0.3 

1.95 

1.90 

267 

0.21 

0.2 

3.08 

3.84 

85 

0.08 

0.1 

6.79 

12.50 

45 

0.05 

orsirr 2: c 

— 3 and ^max — 

= 10 

e 

spar 

ptime 

iter 

stime 

0.4 

2.14 

0.14 

28 

0.01 

0.3 

2.75 

0.21 

23 

0.01 

0.2 

4.36 

0.52 

16 

0.01 

0.1 

8.23 

2.35 

11 

0.01 


Finally, we vary c to investigate how it affects the sparsity of M and the convergence 
of BiCGStab. We set e = 0.4 and /max = 20, and take sherman2, ex36 and raefsky3 as 
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examples. Table S] lists the results. We find that the M gradually become denser but 
the CPU time of constructing them does not necessarily become more as c increases. 
This should be expected since at each loop the main cost of RSAI(to/) is the solutions 
of LS problems other than the ordering of entries of the current and picking up 
indices. By comparison, as far as the overall performance, measured by ptime, ric and 
iter, is concerned, we find that c = 3 may be a very best choice. 


Table 4: The results for sherman2, ex36 and raefsky3 with different c 



sherman2: e = 0.4 and /max 

= 20 

c 

spar 

ptime 

ric 

iter 

stime 

1 

1.99 

1.70 

2 

8 

0.01 

2 

2.13 

1.55 

0 

7 

0.01 

3 

2.76 

1.59 

0 

6 

0.01 

4 

3.22 

1.70 

0 

7 

0.01 

5 

3.55 

1.94 

0 

7 

0.01 


ex36: 

e = 0.4 and /, 

max — 

20 

c 

spar 

ptime 

ric 

iter 

stime 

1 

1.83 

7.12 

59 

103 

0.11 

2 

2.22 

5.75 

0 

85 

0.11 

3 

2.40 

4.86 

0 

91 

0.11 

4 

2.67 

4.94 

0 

83 

0.11 

5 

2.93 

5.00 

0 

86 

0.11 


raefsky3: e = 0.4 and /max 

= 20 

c 

spar 

ptime 

ric 

iter 

stime 

1 

1.17 

120.49 

0 

t 

— 

2 

1.31 

111.59 

0 

206 

4.20 

3 

1.46 

104.70 

0 

206 

4.78 

4 

1.59 

120.85 

0 

91 

2.03 

5 

1.74 

125.84 

0 

62 

1.46 


5.2 The RSAI(to/) algorithm versus the SPAI algorithm 

In this subsection, we compare the performance of RSAI(to/) and SPAI. For RSAI(to/) 
we take e = 0.3, the number c = 3 of the dominant indices exploited at each loop and 
^max = 10. For SPAI we take the same e and add three most profitable indices to the 
pattern of ruk at each loop for fe = 1, 2,..., n. We set / max = [10 x nnz{A)/{3 x n)J 
to control nnz{M)/nnz{A) < 5 for the SPAI algorithm, where [xj is the Gaussian 
function. Table 0 presents the results. 

From the table, we see that the M constructed by the RSAI(to/) algorithm are 
(almost) equally sparse as the counterparts by the SPAI algorithm except powersim. 
However, the RSAI(to/) algorithm uses substantially less CPU time and more efficient 
than the SPAI algorithm except powersim, especially when a given A is irregular sparse. 
For a dense column of A, the cardinalities of the corresponding jTfc used by SPAI are big 
during the loops, causing that it is time consuming to determine the most profitable 
indices added to the patterns of nik at each loop. In contrast, RSAI(to/) overcomes 
this shortcoming by quickly finding new indices added at each loop and is considerably 
more efficient. For sherman2, raefsky3 and some others, we also find that the RSAI(to/) 
algorithm spends much less CPU timings than the SPAI algorithm when the given 
problem is relatively dense, i.e., the average number d of nonzero entries per column is 
relatively large. This is because SPAI spent much more time seeking the most profitable 
indices for d bigger than RSAI(to/). 
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Table 5: The RSAI(to/) algorithm versus the SPAI algorithm 


matrices 


The RSAI(to/) algorithm 


The SPAI algorithm 


Constructing M 

BiCGStab 

Constructing M 

BiCGStab 

spar 

ptime 

Tic 

iter 

stime 

spar 

ptime 

Tic 

iter 

stime 

fs_541_3 

1.52 

0.30 

1 

16 

0.01 

1.45 

0.49 

6 

18 

0.01 

orsirr_l 

2.67 

0.23 

0 

24 

0.01 

1.58 

0.38 

2 

29 

0.01 

orsirr_2 

2.75 

0.21 

0 

23 

0.01 

1.63 

0.36 

2 

26 

0.01 

shermanl 

2.38 

0.07 

0 

29 

0.01 

1.72 

0.12 

7 

31 

0.01 

sherman2 

2.97 

2.19 

1 

5 

0.01 

2.69 

66.7 

102 

t 

— 

shermanh 

1.65 

0.50 

0 

30 

0.02 

1.18 

2.00 

3 

34 

0.02 

saylr4 

1.95 

1.90 

0 

267 

0.21 

1.97 

2.09 

15 

271 

0.20 

cavity 11 

3.13 

45.9 

243 

172 

0.28 

2.79 

635.8 

534 

215 

0.34 

ex36 

2.58 

6.45 

160 

77 

0.10 

1.63 

28.01 

191 

93 

0.10 

e20r0100 

2.90 

26.21 

413 

149 

0.47 

2.98 

980.3 

895 

212 

0.68 

e30r0000 

2.79 

76.8 

916 

168 

1.28 

3.09 

2727 

2606 

211 

1.69 

e40r0100 

2.90 

138.7 

1664 

425 

7.99 

3.06 

5181 

4649 

477 

6.70 

power sim 

4.92 

24.2 

1277 

839 

3.02 

2.63 

21.7 

1414 

t 

— 

raefsky3 

2.31 

355 

0 

75 

2.12 

1.10 

10655 

94 

t 

— 

scircuit 

2.82 

1283 

4 

849 

46.41 

2.19 

21134 

7535 

t 

- 



For the preconditioning effectiveness, we observe that the ric in RSAI(toZ) are con¬ 
siderably smaller than those in SPAI in all cases, especially when A is irregular sparse. 
The reason is that, as we have explained, RSAI(to/) is more effective to capture a good 
approximate sparsity pattern of A~^ than SPAI. Our results illustrate that with the 
roughly same number of nonzero entries allowed, the M by RSAI(to/) are more effective 
than their counterparts by SPAI, that is, RSAI(to/) indeed captures the positions of 
large entries more reliably than SPAI does, and the latter may retain positions of some 
small nonzero entries but misses some large entries. As a result, the M generated by 
SPAI are less effective than those by RSAI(to/). This is also confirmed by the number 
iter of iterations, where the iter preconditioned by RSAI(to/) are considerably fewer 
than those by SPAI for all the test problems. Particularly, for sherman2, powersim, 
raefskyS and scircuit, SPAI fails to make BiCGStab converge within 1000 iterations, 
while RSAI(to/) works well, and this is even true for the regular sparse matrix sherman2 
and raefskyS. Actually, for powersim and scircuit, BiCGStab preconditioned by SPAI 
reduces the relative residual to the level of 10“^ after several iterations, and afterwards 
it does not decrease further. 

In view of the above, the RSAI(to/) algorithm is at least competitive with and can 
be considerably more efficient and effective than the SPAI algorithm, especially for the 
problems where A is irregular sparse or has relatively more nonzero entries. 

5.3 The RSAI(to/) algorithm versus the PSAI(to/) algorithm 

Keeping in mind the results of Section we now compare the RSAI(tol) algorithm 
with the PSAI(to/) algorithm proposed in |29|| and improved in [^. We also take 
e = 0.3 and /max = 10 for PSAI(to/). Table [6] reports the results obtained by PSAI(to/). 


Table 6: The results by the PSAI(to/) algorithm 


matrices 

Preconditioning 

BiCGStab 

spar 

ptime 

ric 

iter 

stime 

fs.541.3 

1.87 

0.13 

0 

5 

0.01 

orsirr_l 

5.36 

1.69 

0 

25 

0.01 

orsirr_2 

5.66 

1.55 

0 

23 

0.01 

shermanl 

2.89 

0.12 

0 

27 

0.01 

sherman2 

2.74 

1.59 

0 

4 

0.01 

shermanS 

1.57 

0.46 

0 

29 

0.02 

saylr4 

1.86 

12.26 

0 

307 

0.26 

cavity 11 

11.84 

155.9 

0 

55 

0.35 

ex36 

6.45 

13.50 

0 

55 

0.12 

e20r0100 

9.44 

177.4 

0 

91 

0.79 

e30r0000 

13.19 

1670 

29 

t 

— 

e40r0100 

18.38 

8779 

494 

I 

— 

powersim 

9.91 

72.9 

399 

52 

0.41 

raefsky3 

5.03 

1281 

0 

63 

3.19 


We observe that the ric by the PSAI(to/) algorithm are no more than those by the 
RSAI(toZ) algorithm for all the problems. Actually, PSAI(to/) obtains the M with the 
desired accuracy satisfied for all the problems except eSOrOOOO, edOrOlOO and powersim. 
This means that PSAI(to/) can construct effective preconditioners for most general 
sparse problems. We do not list the test problem scircuit in the table because PSAI(to/) 
was found out of memory, which is due to the occurrence of some large sized LS problems 
during the process. We refer the reader to [^, for explanations on some typical 
features of PSAI(to/). Compared with RSAI(to/), it is seen that, except powersim. 
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eSOrOOOO and edOrOlOO, for the other test problems, there is no obvious winner between 
it and PSAI(toZ) in term of iter. Moreover, the setup time of M by two algorithms 
are also comparable for regular problems. For some problem which has relatively more 
nonzero entries, such as raefskyS and so on, the setup time of M by PSAI(foZ) is more 
than those by RSAI(toZ). This is because we get a denser M by PSAI(toZ). However, it 
is more effective than those by RSAI(toZ) in term of Uc- Based on these observations, 
we conclude that the RSAI(toZ) algorithm is as comparably effective as the PSAI(foZ) 
algorithm. 


6 Conclusions 


SPAI may be costly to seek approximate sparsity patterns of A~^ and ineffective for 
preconditioning a large sparse linear system, which is especially true when A is irregular 
sparse or is not very sparse. To this end, we have proposed a basic RSAI algorithm 
that only uses the dominant information on residual and can adaptively determine 
a good approximate sparsity pattern of A~^. We have derived an estimate for the 
number of nonzero entries of M. In order to control the sparsity of M and improve 
efficiency, we have developed a practical RSAI(foZ) algorithm with the robust adaptive 
dropping strategy [2^ exploited. We have tested a number of real-world problems to 
illustrate the efficiency and preconditioning effectiveness of RSAI(toZ). Meanwhile, we 
have numerically compared it with SPAI and PSAI(toZ), showing that RSAI(to/) is at 
least competitive with and can be substantially more efficient and effective than SPAI, 
and it is also comparably effective as PSAI(toZ). 

Some other research is significant. As we have seen, RSAI(foZ) may be costly for 
A row irregular sparse, which is also the case for SPAI. In order to make RSAI(toZ) 
efficient for both column and row irregular sparse problems, we have exploited the 
idea proposed in 28| to transform such a problem into certain regular ones, so that 
RSAI(toZ) can be much more efficient to construct effective preconditioners 26l |. 
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